Critical Values of Guessing on
True-False and Multiple-Choice Tests

Abstract 

Effects of blind guessing on the success of passing true-false 
and multiple-choice tests are investigated under a stochastic 
binomial model. Critical values of guessing are thresholds which 
signify when the effect of guessing is negligible. By checking a 
table of critical values assembled in this paper, one can make a 
decision with 95% confidence as to whether the correction for 
guessing is necessary for grading a true-false or multiple-choice 
test. This useful tabulation also represents an intermediate 
step toward development of a more comprehensive model for
non-blind guessing in Bayesian statistics.

Critical Values of Guessing on
True-False and Multiple-Choice Tests

Correction for guessing has been a persistent problem in the 
interpretation of true-false and multiple-choice test scores. 
Many authors have maintained that no solution to this problem is 
in sight. Thorndike (1971) pointed out: "Practice in United 
States testing organizations and among test publishers with 
respect to using the correction formula remains divided" (p. 59). 
Payne (1992) concurred: "Researchers have for more than 30 years 
been investigating the problem of whether or not to correct for 
guessing. There is still no definite answer or agreement among 
the experts" (p. 108). 

One approach to the correction for guessing has been to 
investigate the conditions under which the influence of blind 
guessing on the scores of a test is negligible. Sax (1989)
pointed out that teachers should include more items in tests so
that the effect of guessing can be ignored. Hopkins and Stanley (1981)
asserted: "It should be evident that the greater the number of 
options per item, the less likely it is that one will select the 
correct option by chance and, hence, the less the magnitude of 
the weighting of an incorrect response" (p. 149). Most 
researchers agreed that the influence of blind guessing on the 
scores of a test diminishes as the length of a test and the 
number of options per item increase (e.g., Ebel and Frisbie, 
1991; Brown, 1981; and Mehrens and Lehmann, 1984).

Nonetheless, when the correction for guessing is ignored, it
becomes possible that a student may pass a test through guessing.
In terms of statistics, a null hypothesis (H_0) may be formulated
that the effect of guessing is negligible. The mistake of
ignoring the role of guessing when the effect of guessing does
exist is called Type I error. In social sciences, the acceptable
risk of making Type I error is conventionally set at α = .05.

A critical value is a statistic that marks the edge of the
rejection region of H_0 at α = .05 (Heiman, 1992). In a long
true-false or multiple-choice test, the probability of obtaining a
high score through guessing is small (Sax, 1989). The passing
score of a test is the statistic that controls the risk of Type I
error. The higher the passing score, the less the risk of
rejecting the null hypothesis (H_0). The rejection region of H_0
contains scores at which the probability of passing a test
through guessing is less than 5%. The lowest passing score which
guarantees a risk no larger than 5% is the critical value of a
passing score for the correction for guessing. By checking whether a
passing score of a test is higher than the corresponding critical
value, a decision can be made with 95% confidence as to whether
the correction for guessing is necessary. Accordingly, although
no solution to the correction for guessing is in sight, it is
possible to construct a table of critical values to evaluate the
effect of guessing.



Purpose 

The critical value of a passing score is determined by the
structure of a test and the stochastic model of guessing in which
the probability of passing the test through guessing is
delineated. Nevertheless, no such stochastic model has yet been
emphasized in educational and psychological measurement, let
alone the construction of critical values to fit the structure
of various tests (Brown, 1981; Mehrens & Lehmann, 1987). The
purpose of this paper is to build a stochastic model describing
the probability of passing a true-false or multiple-choice test
through guessing, and to assemble a table of critical values for
commonly used standardized tests.

The application of the table is straightforward. For a test
with a given total number of items (N) and probability of
guessing an item correctly (p), the table contains a critical
value (x_0) identified from the stochastic model. Based on the
rationale of hypothesis testing, the correction for guessing
cannot be ignored unless the passing score (x) of the test has
been set at a level x > x_0. Hence, the critical value x_0 acts
as a threshold which determines when the effect of guessing is
negligible.

Stochastic Model 

The probability of successfully passing a test through blind
guessing can be modeled as a coin-tossing process. The head and
tail from the coin-tossing are two events corresponding to the
success and failure in an item-guessing process, respectively.
Given that an item has n options, the probability of obtaining
the correct option through blind guessing is 1/n. Because the
number of options in each item is no less than 2, the probability
of guessing an answer correctly is no larger than 50% in general.
Thus, the tossed coins are generally unbalanced. The relations
between the number of options in an item and the probability of
guessing the answer correctly are listed in Table 1.
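
The relation p = 1/n can be checked with a short sketch. This is an illustrative calculation only: the function name `guess_probability` is introduced here, and rounding half up to two decimals is an assumption, chosen so that 1/8 is displayed as .13.

```python
from decimal import Decimal, ROUND_HALF_UP

def guess_probability(n_options: int) -> Decimal:
    """Probability of blindly guessing an n-option item correctly (p = 1/n),
    rounded half up to two decimals (the rounding rule is an assumption)."""
    if n_options < 2:
        raise ValueError("an item needs at least 2 options")
    p = Decimal(1) / Decimal(n_options)
    return p.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Print the probability for items with 2 through 10 options.
for n in range(2, 11):
    print(n, guess_probability(n))
```

Decimal arithmetic is used instead of the built-in `round`, which rounds exact halves to the nearest even digit and would display 1/8 as 0.12.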



Table 1 inserted around here 



The coin-tossing process is an elementary stochastic
process, and has been readily solved in most mathematical statistics
textbooks (Casella & Berger, 1990). Since most commonly used
standardized tests have the same number of options per item, the 
probability of obtaining a correct answer through guessing does 
not change from item to item. For a test which contains N 
different items, the results of taking the test through blind 
guessing are equivalent to the events of independently tossing a 
set of N identical coins. 

In statistics, a single coin tossing is a Bernoulli trial,
and the entirety of tossing N coins follows a binomial (N, p)
distribution with p equal to the probability of head in each
trial (Bhat, 1984). Thus, the probability of having x heads in N
trials is:

\[ P(X = x \mid N, p) = \binom{N}{x} p^{x} (1-p)^{N-x}, \qquad x = 0, 1, 2, \ldots, N \tag{1} \]
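
As a minimal sketch, the binomial probability in formula (1) can be evaluated directly with the Python standard library. The function name `binom_pmf` and the 4-item true-false example are assumptions made here for illustration.

```python
from math import comb

def binom_pmf(x: int, N: int, p: float) -> float:
    """P(X = x | N, p): probability of exactly x correct guesses on N items."""
    return comb(N, x) * p**x * (1 - p) ** (N - x)

# Hypothetical example: a 4-item true-false test (p = 1/2).
N, p = 4, 0.5
for x in range(N + 1):
    print(x, binom_pmf(x, N, p))
```

For this example the probability of a perfect score is 0.5 to the fourth power, or .0625.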

The total number of items (N) and the number of options per
item (n) are structural characteristics of a test. The
probability of guessing an item correctly (p) is determined in
Table 1 by the number of options per item (n). The events of
passing a test include all cases in which one obtains a score at
or above the passing score. The critical value is the lowest
passing score at which the probability of passing the test
through guessing is no larger than .05. Hence, the cumulative
probability of obtaining a score at or above the critical value
through guessing can be calculated using the binomial (N, p)
distribution. For a given N and p, the critical value (x_0)
follows formula (2):

\[ \sum_{k=x_0}^{N} \binom{N}{k} p^{k} (1-p)^{N-k} \le .05 \tag{2} \]
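
A critical value can also be computed directly from the binomial model rather than read from published tables. The sketch below is not the tabulation procedure used in this paper. It assumes that passing means scoring at or above the passing score, and the names `tail_prob` and `critical_value` are introduced here for illustration.

```python
from math import comb
from typing import Optional

def tail_prob(x: int, N: int, p: float) -> float:
    """P(X >= x | N, p): chance of reaching score x or more by blind guessing."""
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(x, N + 1))

def critical_value(N: int, p: float, alpha: float = 0.05) -> Optional[int]:
    """Smallest integer score whose guessing tail probability is <= alpha,
    or None when even a perfect score is reachable with probability > alpha."""
    for x in range(N + 1):
        if tail_prob(x, N, p) <= alpha:
            return x
    return None

print(critical_value(10, 0.5))  # 9
print(critical_value(4, 0.5))   # None
```

For a 10-item true-false test, P(X >= 8) = 56/1024 exceeds .05 while P(X >= 9) = 11/1024 does not, so the sketch returns 9.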



The construction of critical values based on formula (2)
needs the cumulative sums of terms of the binomial distribution.
Eisenhart (1949) pointed out:

Cumulative sums of terms of the binomial distribution can be 
obtained directly from Tables of the Incomplete Beta- 
Function (Edited by Karl Pearson, Biometrika Office, 
University College, London, 1934), but owing to the conflict 
between the notation of the tables and that commonly used
for the binomial distribution, the extraction of a binomial
probability from the tables is particularly difficult on
each new occasion, and even for continual use requires
patience and care. (p. IV)

Ferris (1944) concurred:

For sample sizes up to 50, generally including the first
sample size in a double-sampling plan, the required binomial
values can be read directly from Karl Pearson's compilation.
However, for second sample sizes above 50 and quality levels
in the range stated above, no tabulations of any scope were
available. (p. 1)

Fortunately, Pearson's table has been converted into
Tables of the Binomial Probability Distribution by the National
Bureau of Standards (1950) for sample sizes (N) equal to 1, 2,
..., 49. The Ballistic Research Laboratory (1944) also assembled
Tables of Binomial Probabilities for N equal to 60, 75, 90, 100,
150, 200, 250 and 300. According to Burington and May (1970),
these are the extensive tables of the binomial distribution.

Non-blind guessing can be modeled as a Bayesian stochastic
process. Because a student may have partial knowledge, an
informative guess could be made in a true-false or multiple-choice
test. Based on Bayesian statistics, the chance of committing
the guess can be described by a conditional probability. Given
the condition that the guess has been made, the probability
of making a correct guess can be simplified to a binomial
distribution (Casella & Berger, 1990). Thus, tabulation of the
simple binomial model represents an indispensable step toward
development of the more comprehensive guessing model for
non-blind guessing in Bayesian statistics.

A Table of Critical Values

The critical values constructed in this paper are based on
the two tables compiled by the National Bureau of Standards (1950)
and the Ballistic Research Laboratory (1944). Because the criterion
of α = .05 is set in formula (2), not every combination of N and p
has a critical value x_0. For example, for a test with a small N
and large p, such as N=4 and p=.5, the probability of obtaining a
full score through guessing is .0625, a value larger than .05.
Thus, no matter what passing score is chosen, the effect of
guessing for this test is not negligible at the α = .05 level.
The same situation exists for a test with N=3 and p=.5, or with
N=2 and p>.22.
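
The existence condition in the example above can be worked out numerically. No critical value exists while the probability of a perfect score by guessing, p to the power N, is still larger than α. The sketch below finds the shortest test length for which that probability drops to α or below; the function name `min_items_for_critical_value` is an assumption introduced for illustration.

```python
def min_items_for_critical_value(p: float, alpha: float = 0.05) -> int:
    """Smallest N for which a perfect score by blind guessing
    has probability p**N <= alpha (requires 0 < p < 1)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    N = 1
    while p**N > alpha:
        N += 1
    return N

# Shortest usable test length for items with 2 through 10 options.
for n in range(2, 11):
    print(n, min_items_for_critical_value(1 / n))
```

For true-false items (p = .5) the sketch gives N = 5, since .5 to the fourth power is .0625 and .5 to the fifth power is .03125.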

It should be further noted that the score of a multiple-choice
test, including the passing score, is an integer count of the
number of correctly answered items. However, the critical
value (x_0) calculated from formula (2) may not be an integer.
Fractional values of x_0 are not physically interpretable because
critical values represent the minimum passing scores which
cannot be achieved by blind guessing at the α ≤ .05 level. To
guarantee that the risk probability is no larger than α, the
critical value (x_0) calculated from formula (2) is rounded up to
an integer. As a result, with a passing score higher than the
critical value, the risk of allowing a student to pass a test
through guessing is less than α.

Critical values (x_0) of passing scores for commonly used
standardized tests are listed in Table 2, with the probability of
guessing an item correctly (p) identified by the number of
options per item (n) and the length of a test represented by the
total number of test items (N).



Table 2 inserted around here 



The implication of Table 2 is two-fold. First, it has been
shown that the critical value (x_0) increases as the number of
options per item (n) decreases. Second, while critical values
(x_0) increase along with the length of a test (N), the ratio
x_0/N generally decreases as N increases. Hence, it is
demonstrated in Table 2 that the effect of blind guessing
diminishes as the number of options per item and the length of a
test increase.
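
Both trends can be explored numerically with a short sketch. It recomputes critical values from the binomial model instead of reading them from Table 2, and assumes that passing means scoring at or above the passing score. The test lengths and option counts below are arbitrary illustrative choices, and the function name `critical_value` is introduced here.

```python
from math import comb

def critical_value(N, p, alpha=0.05):
    """Smallest integer x with P(X >= x | N, p) <= alpha, or None if none exists."""
    tail = 0.0
    best = None
    # Accumulate the upper tail from a perfect score downward.
    for x in range(N, -1, -1):
        tail += comb(N, x) * p**x * (1 - p) ** (N - x)
        if tail > alpha:
            break
        best = x
    return best

# Illustrative grid: test lengths N and numbers of options n.
for N in (10, 20, 50, 100, 200):
    row = []
    for n in (2, 4, 5):
        x0 = critical_value(N, 1 / n)
        row.append(f"n={n}: x0={x0}, x0/N={x0 / N:.3f}")
    print(N, "; ".join(row))
```

Reading the printed rows across shows how x_0 changes with n, and reading down a column shows how x_0/N changes with N.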

Discussion 

Kane (1994) pointed out: "The validity of test-based
decisions about readiness for a course or a profession depends on
the appropriateness of the passing scores used to make the
decisions" (p. 425). The critical value of passing score
presented in Table 2 is an instrument to measure the effect of
guessing in a limited number of true-false or multiple-choice
tests. For a test with length (N) and number of options per
item (n) not listed in Table 2, Formulae (3) and (4) can be
employed to construct the critical value (Casella & Berger,
1990).

\[ \sum_{k=0}^{x} \binom{N}{k} p^{k} (1-p)^{N-k} = (N-x) \binom{N}{x} \int_{0}^{1-p} t^{N-x-1} (1-t)^{x} \, dt \tag{3} \]

\[ P(X = x) = \frac{N-x+1}{x} \cdot \frac{p}{1-p} \, P(X = x-1) \tag{4} \]

Formula (3) is based on an extensive table of the Incomplete
Beta-Distribution. Formula (4) is a recursion equation to augment
the list of critical values. The reason for using (3) and (4)
rather than a linear interpolation is that "linear interpolation
will generally not be accurate to more than two decimal places,
and sometimes less" (Burington & May, 1970, p. 351).
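
The recursion can be sketched in a few lines: starting from P(X = 0) = (1 - p)^N, each probability is the previous one multiplied by (N - x + 1)/x and by p/(1 - p). The function name `binom_pmf_recursive` and the 10-item, 4-choice example are assumptions made for illustration.

```python
from math import comb

def binom_pmf_recursive(N: int, p: float) -> list:
    """All P(X = x) for x = 0..N, built from P(X = 0) by the recursion
    P(X = x) = (N - x + 1) / x * p / (1 - p) * P(X = x - 1); requires 0 < p < 1."""
    probs = [(1 - p) ** N]
    for x in range(1, N + 1):
        probs.append(probs[-1] * (N - x + 1) / x * p / (1 - p))
    return probs

# Cross-check against the closed form for a hypothetical 10-item, 4-choice test.
N, p = 10, 0.25
direct = [comb(N, x) * p**x * (1 - p) ** (N - x) for x in range(N + 1)]
recursive = binom_pmf_recursive(N, p)
print(max(abs(a - b) for a, b in zip(direct, recursive)))
```

The printed value is the largest absolute difference between the two computations, which should be at the level of floating-point rounding.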

TABLE 1

Probability of Guessing a Correct Answer to an n-choice Test Item

Number of Choices   2     3     4     5     6     7     8     9     10
Probability         .50   .33   .25   .20   .17   .14   .13   .11   .10

TABLE 2

Critical Value of Passing Score for an N-item, n-Choice Test

(Table body not recoverable from the source. Rows: Number of items (N). Columns: Number of Choices (n).)